Random Rounding¶
Algorithm¶
Solve \(\operatorname{SDP} (\boldsymbol{W})\) and obtain \(\hat{\boldsymbol{X}}\)
Decompose \(\hat{\boldsymbol{X}} = \hat{\boldsymbol{V}} ^{\top} \hat{\boldsymbol{V}}\), e.g. using the EVD \(\hat{\boldsymbol{X}} = \hat{\boldsymbol{U}} \hat{\boldsymbol{\Lambda}} \hat{\boldsymbol{U}} ^{\top}\) and setting \(\hat{\boldsymbol{V}} ^{\top} = \hat{\boldsymbol{U}} \sqrt{\hat{\boldsymbol{\Lambda}}}\), or using a Cholesky factorization. Note that \(\left\| \hat{\boldsymbol{v}} _i \right\| =1\) always holds, due to the constraint \(\hat{X}_{ii}=1\).
Sample a direction \(\boldsymbol{r}\) uniformly from \(S^{p-1}\)
Return binary partition assignment \(\hat{\boldsymbol{x}} = \operatorname{sign} (\boldsymbol{\hat{V}} ^{\top} \boldsymbol{r} )\)
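The steps above can be sketched in NumPy. This is a minimal sketch: since we do not solve the SDP here, the toy \(\hat{\boldsymbol{X}}\) below is a hypothetical rank-1 input \(\boldsymbol{x}\boldsymbol{x}^{\top}\) standing in for the SDP solution, so the rounding recovers the planted partition up to a global sign.

```python
import numpy as np

# Hypothetical SDP solution: an exact rank-1 X = x x^T (PSD, unit diagonal)
x_true = np.array([1, 1, -1, -1, -1])
X_hat = np.outer(x_true, x_true).astype(float)

# Decompose X = V^T V via EVD: X = U Lam U^T  =>  V^T = U sqrt(Lam)
lam, U = np.linalg.eigh(X_hat)
lam = np.clip(lam, 0, None)            # guard against tiny negative eigenvalues
V = (U * np.sqrt(lam)).T               # columns v_i satisfy ||v_i|| = 1

# Sample a uniform direction r on the unit sphere and round
rng = np.random.default_rng(0)
r = rng.normal(size=V.shape[0])
r /= np.linalg.norm(r)
x_hat = np.sign(V.T @ r)               # equals x_true up to a global sign flip
```

Because \(\hat{\boldsymbol{X}}\) here is exactly \(\boldsymbol{x}\boldsymbol{x}^{\top}\), all \(\hat{\boldsymbol{v}}_i\) are parallel (up to sign), and any sampled \(\boldsymbol{r}\) separates the two groups with probability 1.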
Intuition¶
Randomly sample a hyperplane in \(\mathbb{R} ^p\) characterized by its normal vector \(\boldsymbol{r}\). If \(\hat{\boldsymbol{v}} _i\) lies on the same side of the hyperplane as \(\boldsymbol{r}\), then set \(\hat{x}_i =1\), else \(\hat{x}_i = -1\).
If there indeed exists a partition \(I\) and \(J\) of vertices characterized by \(\boldsymbol{x}\), then the two groups of directions \(\boldsymbol{v} _i\)’s and \(\boldsymbol{v} _j\)’s should point in opposite directions, since \(\boldsymbol{v} _i ^{\top} \boldsymbol{v} _j = x_i x_j = -1\) for \(i \in I, j \in J\). After random rounding, they should be well separated. Hence, if \(\hat{\boldsymbol{v}}_i ^{\top} \hat{\boldsymbol{v} }_j\) recovers \(\boldsymbol{v}_i ^{* \top} \boldsymbol{v}^* _j\) well enough, then \(\hat{\boldsymbol{x}}\) recovers \(\boldsymbol{x}^*\), the optimal max-cut in \(\operatorname{cut}(\boldsymbol{W})\), well.
To see what \(\boldsymbol{V}\) looks like for \(\boldsymbol{x} \in \left\{ \pm 1 \right\} ^n\), let \(\boldsymbol{x} = [1, 1, -1, -1, -1]\); then we can take \(\boldsymbol{v}_i = x_i \boldsymbol{e}\) for any fixed unit vector \(\boldsymbol{e}\), i.e. \(\boldsymbol{V} = [\boldsymbol{e}, \boldsymbol{e}, -\boldsymbol{e}, -\boldsymbol{e}, -\boldsymbol{e}]\).
Analysis¶
How well does the algorithm do in expectation? We define the Goemans–Williamson quantity as the expected cut value returned by the algorithm, where the randomness comes from the random direction \(\boldsymbol{r}\):
\[\operatorname{GW}(\boldsymbol{W}) = \mathbb{E}_{\boldsymbol{r}}\left[ \sum_{i,j}^n w_{ij} \mathbb{1}\{\hat{x}_i \ne \hat{x}_j\} \right] = \sum_{i,j}^n w_{ij}\, \mathbb{P}(\hat{x}_i \ne \hat{x}_j)\]
Obviously \(\operatorname{GW} (\boldsymbol{W}) \le \operatorname{cut} (\boldsymbol{W})\), since we are averaging the cut values of feasible solutions \(\hat{\boldsymbol{x}} \in \left\{ \pm 1\right\}^n\), each of which is \(\le \operatorname{cut} (\boldsymbol{W})\). But how small can \(\operatorname{GW} (\boldsymbol{W})\) be?
It can be shown that \(\operatorname{GW}(\boldsymbol{W}) \ge \alpha \operatorname{cut}(\boldsymbol{W})\) where \(\alpha \approx 0.87\). That is, in expectation the random rounding algorithm returns a cut value that is not much smaller than the optimal value.
Proof
We randomly sample a direction \(\boldsymbol{r}\) from the unit sphere \(S^{p-1}\). If \(\boldsymbol{v} _i\) and \(\boldsymbol{v} _j\) lie on different sides of the hyperplane characterized by \(\boldsymbol{r}\), we call such a direction ‘good’.
In the \(p=2\) case, we sample from a unit circle. All good \(\boldsymbol{r}\) lie on two arcs of the circle, whose lengths are related to the angle between \(\boldsymbol{v} _i\) and \(\boldsymbol{v} _j\), denoted \(\theta\). The probability of sampling a good direction equals the ratio between the total length of the two arcs and the circumference. Thus,
\[\mathbb{P}(\text{good}) = \frac{2\theta}{2\pi} = \frac{\theta}{\pi} = \frac{1}{\pi} \arccos (\boldsymbol{v} _i ^{\top} \boldsymbol{v} _j)\]
In the \(p = 3\) case, we sample \(\boldsymbol{r}\) from a unit sphere. All good \(\boldsymbol{r}\) lie on two spherical wedges with dihedral angle \(\theta\). The ratio between the area of each spherical wedge and the area of the sphere is \(\theta/2\pi\), so the probability of sampling a good direction is again \(\theta/\pi\). An example is given after the proof.
Now we compare \(\operatorname{GW}(\boldsymbol{W}) = \sum_{i,j}^n w_{ij} \frac{1}{\pi}\arccos (\boldsymbol{v} _i ^{\top} \boldsymbol{v} _j)\) and \(\operatorname{SDP} (\boldsymbol{W}) = \sum_{i,j}^n w_{ij} \frac{1}{2} (1 - \boldsymbol{v} _i ^{\top} \boldsymbol{v} _j)\).
Let’s first look at the two functions \(f(y) = \frac{1}{\pi}\arccos(y)\) and \(g(y) = \frac{1}{2}(1-y)\) for \(-1 \le y \le 1\). Let \(\alpha = \min_{-1 \le y \le 1} \frac{f(y)}{g(y)}\); numerically one finds \(\alpha \approx 0.87\).
Fig. 11 Plots of \(f(y)\) and \(g(y)\)¶
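The constant \(\alpha\) can be located numerically with a simple grid search (a sketch; the exact minimizer solves a transcendental equation, so the grid only approximates it):

```python
import numpy as np

# Grid search for alpha = min f(y)/g(y) over (-1, 1);
# f(y) = arccos(y)/pi, g(y) = (1 - y)/2
y = np.linspace(-1 + 1e-9, 1 - 1e-9, 1_000_000)
ratio = (np.arccos(y) / np.pi) / ((1 - y) / 2)
alpha = float(ratio.min())        # about 0.8786
y_star = float(y[ratio.argmin()]) # minimizer, about -0.689
print(round(alpha, 4), round(y_star, 3))
```

The ratio tends to 1 as \(y \to -1\) and blows up as \(y \to 1\), so the minimum is attained in the interior, at \(y \approx -0.69\).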
Therefore, letting \(y_{ij} = \boldsymbol{v} _i ^{\top} \boldsymbol{v} _j\), we have \(f(y_{ij}) \ge \alpha g(y_{ij})\) term by term; since \(w_{ij} \ge 0\), summing gives \(\operatorname{GW} (\boldsymbol{W}) \ge \alpha \operatorname{SDP} (\boldsymbol{W})\). Using the SDP relaxation inequality \(\operatorname{SDP} (\boldsymbol{W}) \ge \operatorname{cut} (\boldsymbol{W})\), we have \(\operatorname{GW} (\boldsymbol{W}) \ge \alpha \operatorname{cut} (\boldsymbol{W})\).
Note that we require \(w_{ij} \ge 0\).
import numpy as np
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
# Two fixed unit directions v1 and v2
v = np.array([[1, 0, 3], [-1, 0, 3]])
v = v / np.linalg.norm(v, 2, axis=1)[:, np.newaxis]
# Sample n directions uniformly on the unit sphere S^2
n = 3000
np.random.seed(1)
x = np.random.normal(size=(n, 3))
x = x / np.linalg.norm(x, 2, axis=1)[:, np.newaxis]
# Keep the 'good' directions: those whose hyperplane separates v1 and v2
x = x[np.dot(x, v[0]) * np.dot(x, v[1]) < 0, :]
print(f'v1 = {np.round(v[0],3)}, \nv2 = {np.round(v[1],3)}')
print(f'arccos(v1,v2)/pi = {np.round(np.arccos(v[0] @ v[1].T)/np.pi,3)}')
print(f'simulated result = {np.round(len(x)/n,3)}')
fig = px.scatter_3d(x=x[:,0], y=x[:,1], z=x[:,2], size=np.ones(len(x)), range_z=[-1,1])
fig.add_scatter3d(x=[0, v[0,0]], y=[0, v[0,1]], z=[0, v[0,2]], name='v1')
fig.add_scatter3d(x=[0, v[1,0]], y=[0, v[1,1]], z=[0, v[1,2]], name='v2')
fig.show()
v1 = [0.316 0. 0.949],
v2 = [-0.316 0. 0.949]
arccos(v1,v2)/pi = 0.205
simulated result = 0.208
Besides, how large can the SDP relaxation value \(\operatorname{SDP} (\boldsymbol{W})\) be? Grothendieck’s inequality says
\[\operatorname{SDP}(\boldsymbol{W}) \le K \operatorname{cut}(\boldsymbol{W})\]
where \(K \approx 1.7\). Hence, the SDP relaxation \(\Omega_{SDP}\) does not enlarge the original domain \(\Omega\) too much (otherwise we might see \(\operatorname{SDP}(\boldsymbol{W}) \gg \operatorname{cut}(\boldsymbol{W})\)). Hence \(\hat{\boldsymbol{v}}_i ^{\top} \hat{\boldsymbol{v}} _j\) should recover the binary \(x_i^* x_j^*\) well.
For SBM¶
The above inequalities apply to any problem instance \(G=(V, E, \boldsymbol{W})\), so for a particular model they may be too loose to be informative. Let’s see how the method performs on stochastic block models.
We work with the mean-shifted matrix \(\boldsymbol{B} = 2\boldsymbol{A} - \boldsymbol{1} \boldsymbol{1} ^{\top}\), where \(\boldsymbol{A}\) is the adjacency matrix of \(G\), so that \(b_{ij} = 1\) if \((i,j)\) is an edge and \(b_{ij} = -1\) otherwise.
Essentially, \(\boldsymbol{B}\) just re-codes the connectivity in \(G\) from 1/0 to 1/-1. Note that in the SBM, \(\boldsymbol{B}\) is random, depending on the parameters \(p\) and \(q\). In the perfect case \(p=1, q=0\), we can read the cluster labels \(\boldsymbol{x}\in \left\{ \pm 1 \right\}^n\) directly from \(\boldsymbol{B}\), which can be expressed exactly as \(\boldsymbol{B} = \boldsymbol{x} \boldsymbol{x} ^{\top}\). In general, \(\boldsymbol{B}\) cannot be expressed as \(\boldsymbol{x} \boldsymbol{x} ^{\top}\) for any \(\boldsymbol{x} \in \left\{ \pm 1 \right\}^n\). We instead want to find some \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top}\) that is close enough to \(\boldsymbol{B}\), and then use \(\boldsymbol{x}\) as the approximate cluster label.
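The perfect-case identity \(\boldsymbol{B} = \boldsymbol{x}\boldsymbol{x}^{\top}\) can be checked in a few lines (a minimal sketch; the toy labels and the self-loop convention \(A_{ii}=1\) are my own assumptions):

```python
import numpy as np

# Perfect SBM (p = 1, q = 0): an edge exists iff the two nodes share a cluster
x = np.array([1, 1, 1, -1, -1])             # hypothetical cluster labels
A = (np.outer(x, x) > 0).astype(int)        # adjacency matrix, A_ii = 1
B = 2 * A - np.ones((5, 5), dtype=int)      # re-code 1/0 to 1/-1
print(np.array_equal(B, np.outer(x, x)))    # True
```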
Similar to the max-cut case, we apply the SDP relaxation that drops the rank-1 constraint on \(\boldsymbol{X}\). The SDP problem is then
\[\max_{\boldsymbol{X}} \; \operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) \quad \text{s.t.} \quad X_{ii} = 1 \;\; \forall\, i, \quad \boldsymbol{X} \succeq \boldsymbol{0}\]
Note \(\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) = \langle \boldsymbol{B} , \boldsymbol{X} \rangle = \sum_{i,j}^n b_{ij} x_{ij}\). Next, we show that the solution \(\hat{\boldsymbol{X}}\) to the above problem is exactly rank-1, even though we’ve dropped the rank-1 constraint.
Proof
First we convert it to an equivalent minimization problem
\[\min_{\boldsymbol{X}} \; -\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) \quad \text{s.t.} \quad X_{ii} = 1 \;\; \forall\, i, \quad \boldsymbol{X} \succeq \boldsymbol{0}\]
The Lagrangian is
\[\mathcal{L} (\boldsymbol{X} ; \boldsymbol{z} , \boldsymbol{\Lambda}) = -\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) + \sum_{i=1}^n z_i (X_{ii} - 1) - \operatorname{tr}\left( \boldsymbol{\Lambda} \boldsymbol{X} \right) = \operatorname{tr}\left( \left( \operatorname{diag}(\boldsymbol{z}) - \boldsymbol{B} - \boldsymbol{\Lambda} \right) \boldsymbol{X} \right) - \boldsymbol{1}^{\top} \boldsymbol{z}\]
where \(\boldsymbol{z} \in \mathbb{R} ^n\) and \(\boldsymbol{\Lambda} \succeq \boldsymbol{0}\) are the dual variables for the equality constraints \(X_{ii}=1\) and the conic constraint \(\boldsymbol{X} \succeq \boldsymbol{0}\), respectively. Then, by weak duality,
\[\min_{X_{ii}=1,\, \boldsymbol{X} \succeq \boldsymbol{0}} -\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) \;\ge\; \max_{\boldsymbol{z},\, \boldsymbol{\Lambda} \succeq \boldsymbol{0}} \min_{\boldsymbol{X}} \mathcal{L} (\boldsymbol{X} ; \boldsymbol{z} , \boldsymbol{\Lambda})\]
Now we solve the RHS dual problem. For the inner minimization problem \(\min_{\boldsymbol{X}} \mathcal{L} (\boldsymbol{X} ; \boldsymbol{z} , \boldsymbol{\Lambda})\), note that \(\mathcal{L}\) is linear in \(\boldsymbol{X}\), so the minimum is \(-\infty\) unless the coefficient of \(\boldsymbol{X}\) vanishes, i.e. \(\boldsymbol{\Lambda} = \operatorname{diag}(\boldsymbol{z}) - \boldsymbol{B}\), in which case \(\min_{\boldsymbol{X}} \mathcal{L} = -\boldsymbol{1}^{\top} \boldsymbol{z}\).
Plugging this identity into \(\mathcal{L}\) gives the RHS outer maximization problem
\[\max_{\boldsymbol{z}} \; -\boldsymbol{1}^{\top} \boldsymbol{z} \quad \text{s.t.} \quad \operatorname{diag}(\boldsymbol{z}) - \boldsymbol{B} \succeq \boldsymbol{0}\]
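The duality argument can be sanity-checked numerically in the perfect case \(\boldsymbol{B} = \boldsymbol{x}\boldsymbol{x}^{\top}\) (a sketch; the candidate certificate \(\boldsymbol{z} = n\boldsymbol{1}\) is my own choice): it is dual feasible since \(n\boldsymbol{I} - \boldsymbol{x}\boldsymbol{x}^{\top} \succeq \boldsymbol{0}\) when \(\|\boldsymbol{x}\|^2 = n\), and its value matches the primal objective at the rank-1 \(\boldsymbol{X} = \boldsymbol{x}\boldsymbol{x}^{\top}\), certifying that this rank-1 solution is optimal.

```python
import numpy as np

# Toy certificate: for B = x x^T with x in {±1}^n, take z = n * 1
n = 6
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=n)
B = np.outer(x, x)
z = n * np.ones(n)

slack = np.diag(z) - B                              # candidate Lambda = diag(z) - B
dual_feasible = np.linalg.eigvalsh(slack).min() >= -1e-9
# dual value 1^T z = n^2 equals primal value tr(B X) at X = x x^T
values_match = np.isclose(z.sum(), np.trace(B @ np.outer(x, x)))
print(dual_feasible, values_match)  # True True
```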